Standardized Multilingual Language Resources for the Web of Data

Authors

  • Matthias Quasthoff
  • Sebastian Hellmann
  • Konrad Höffner
Abstract

Statistical knowledge of natural languages is indispensable for various kinds of services requiring Natural Language Processing (NLP) functionality, such as information retrieval. The NLP Group at the University of Leipzig started providing such statistical information for more than 50 languages in the Leipzig Corpora Collection (LCC) [1] more than a decade ago. Some of their corpora contain more than 5 million words and more than 300 million links between them, resulting in an accumulated size of about 60 million words and 814 million links in all corpora. So far, this valuable information could be accessed through a human-readable Web site and a SOAP Web service, and excerpts of the data could be downloaded as SQL data dumps. A linked data interface for the LCC has now become desirable in order to allow a wider range of applications to make use of the corpora. In this report, the LCC linked data interface is presented. This new service provides information about almost 60 million resources in approximately 900 million triples. Additionally, links to other vocabularies such as WordNet [2] and to DBpedia [3] are offered. The service is realized using a customized version of D2R Server [4].

Similar resources

Bilinguality vs. Monolinguality among Kalhuri Kurdish Speakers: Gender, Social Class and English Language Achievement

Today in multilingual contexts, many parents prefer to rear their children in the dominant language rather than in their mother tongue. This phenomenon is also widespread among native speakers of the Kalhuri dialect of the Kurdish language in the multilingual context of Iran. Nevertheless, some studies have evidenced the privilege of bilinguals in learning an additional language though some others ...

Motivational Determinants of Code-Switching in Iranian EFL Classrooms

“Code-switching”, an important issue in the field of both language classroom and sociolinguistics, has been under consideration in investigations related to bilingual and multilingual societies. First proposed by Haugen (1956) and later developed by Grosjean (1982), the term code-switching refers to language alternation during communication. Although code-switching is unavoidable in bilingual and...

Impact of Controlled and Free Language Use in Retrieving Articles from the ProQuest and Science Direct Databases

Abstract Introduction: The growth and expansion of the Internet has changed the way information is accessed and many facilities have been created on the Web to facilitate and expedite information locating. Objective: To identify the impact of keyword documentation using the medical thesaurus on the retrieval of articles from ProQuest and Science Direct databases. Materials and Methods: The pr...

How the Multilingual Semantic Web can meet the Multilingual Web

The success of the Web is not based on technology. It is rather based on the availability of tooling to create web content, the vast number of content creators providing content, and finally the users who eagerly “digest” the content and are willing to pay for it, being part of various business models. Not only the Web in general, but also the Multilingual Web is growing. More and more content ...

Publication date: 2009